
CentML AI Inference Provider Integration #810

Open
wants to merge 4 commits into base: main

Conversation


@V2arK V2arK commented Jan 17, 2025

What does this PR do?

Add CentML as a Remote Inference Provider in llama-stack.

This PR integrates CentML into llama-stack, enabling users to utilize CentML's models (meta-llama/Llama-3.3-70B-Instruct and meta-llama/Llama-3.1-405B-Instruct-FP8) for inference tasks like chat and text completions.

Right now, only conda deployments are supported: build with llama stack build --template centml --image-type conda, run with llama stack run run.yaml --port <PORT> --env CENTML_API_KEY=<API_KEY>, and then use llama-stack-client to perform any inference workload as needed (a minimal example is sketched below).
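
For reference, a minimal inference call against the running stack could look like the sketch below. This is an illustration only: it assumes the stack is listening on localhost at the port chosen above, and the exact llama-stack-client argument and response field names may vary slightly between client versions.

from llama_stack_client import LlamaStackClient

# Point the client at the locally running stack (use the port passed to "llama stack run").
client = LlamaStackClient(base_url="http://localhost:5000")

# Basic, non-streaming chat completion against one of the CentML-served models.
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello! Who are you?"}],
)
print(response.completion_message.content)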

Key Changes:

  • Added CentML as a remote inference provider with model support for meta-llama/Llama-3.3-70B-Instruct and meta-llama/Llama-3.1-405B-Instruct-FP8 (a quick model-listing check is sketched below).
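
As a quick sanity check of the provider registration (not part of this PR's automated tests), the registered models can be listed through llama-stack-client. A sketch, assuming the stack was built from the centml template and is running locally:

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# List the models registered with the running stack; the two CentML-served
# models should appear among them.
for model in client.models.list():
    print(model.identifier)

# Expected to include:
#   meta-llama/Llama-3.3-70B-Instruct
#   meta-llama/Llama-3.1-405B-Instruct-FP8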

Addresses issue #809


Test Plan

The integration tests are only run against the dev cluster because of model availability. In addition, the completion endpoint and the logprobs / tool-calling abilities are not yet implemented on the CentML side, so those tests fail and there is currently no point in altering them. Please see the manual testing notes below on the integration tests for more details.

  • Manual Testing:

    • Changed to the development cluster for testing with llama_3b (the production cluster only has the 70B and 405B models).
    • Right now only basic chat completion and model listing are supported.
    • No support yet for tool-calling / completion / logprobs.

Passed test:

llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_model_list[llama_3b-centml] PASSED
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[llama_3b-centml] PASSED
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[llama_3b-centml] PASSED

Reproduction Instructions:

(P.S.: the tests will only run against the development cluster because of model availability; this PR is not configured to point to it by default.)

  1. Change the URL to point to the CentML development cluster.
  2. Run the tests on the dev cluster; the results are shown above:
    pytest -s -v --providers inference=centml llama_stack/providers/tests/inference/test_text_inference.py  -m "llama_3b" --env CENTML_API_KEY=<API_KEY>
  3. Perform inference (screenshot of a successful chat completion omitted; see the client sketch earlier in this description).

Sources

  • Related Issue: #809


Before submitting

@facebook-github-bot

Hi @V2arK!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jan 17, 2025
@V2arK V2arK marked this pull request as ready for review January 21, 2025 22:03